

AD-A160 043  HOW MANY BOOTSTRAPS?(U)  STANFORD UNIV CA  DEPT OF STATISTICS  1/1
R TIBSHIRANI  22 AUG 85  TR-362  N00014-76-C-0475

UNCLASSIFIED  F/G 12/1  NL


HOW  MANY  BOOTSTRAPS? 


BY 

ROBERT  TIBSHIRANI 


TECHNICAL  REPORT  NO.  362 
AUGUST  22,  1985 


Prepared  Under  Contract 
N00014-76-C-0475  (NR-042-267) 

For  the  Office  of  Naval  Research 

Herbert  Solomon,  Project  Director 

Reproduction  in  Whole  or  in  Part  is  Permitted 
for  any  purpose  of  the  United  States  Government 

Approved  for  public  release;  distribution  unlimited 


DEPARTMENT  OF  STATISTICS 
STANFORD  UNIVERSITY 
STANFORD,  CALIFORNIA 


Efron  (1984),  Section  8  and  are  based  on  standard  results  that  can  be  found 
in  many  books  (e.g.  Kendall  and  Stuart  (1958),  chapter  10). 

2.  The  Bootstrap  method  and  a  Statement  of  the 
Problem. 

Suppose that we have a sample $X=(X_1,X_2,\ldots,X_N)$, with the $X_i$ assumed
to be i.i.d. from a distribution $F$. ($X_i$ may be real or vector valued.) Our
statistic of interest is some symmetric function $T(X_1,X_2,\ldots,X_N)$. We
require an estimate of a functional $Q(T,F,X)$. Denoting the empirical
distribution function by $\hat F$, the bootstrap estimate is defined as
$Q(T,\hat F,X)$. Usually, we can't compute this analytically, so we estimate it
through a monte carlo simulation. This is done by a) writing $Q(T,\hat F,X)$ in
terms of quantities of the form $E_{\hat F}R$, then b) estimating each quantity
by a monte carlo estimate of expectation. In the case of
$Q(T,F,X)=\mathrm{VAR}_F T$, for example, we write
$\mathrm{VAR}_{\hat F}T=E_{\hat F}T^2-(E_{\hat F}T)^2$. We draw $B$ bootstrap
samples (that is, samples of size $N$ drawn with replacement from
$X_1,X_2,\ldots,X_N$) and compute the bootstrap values
$T_1^*,T_2^*,\ldots,T_B^*$. Our monte carlo estimate of $\mathrm{VAR}_{\hat F}T$
is then $\sum T_i^{*2}/B-[\sum T_i^*/B]^2$.

If $Q(T,F,X)=\mathrm{Prob}_F(T>c)$, we write
$\mathrm{Prob}_{\hat F}(T>c)=E_{\hat F}I(T>c)$ and our monte carlo estimate is
$\#\{T_i^*>c\}/B$.
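
The two monte carlo estimates above can be sketched in Python as follows. This
is only an illustration: the function names, the use of the sample mean as the
statistic, and the simulated data are assumptions made for the example and are
not part of the report.

```python
import random

def bootstrap_values(x, stat, B, rng):
    """Return the B bootstrap values T_1*, ..., T_B* of the statistic `stat`."""
    n = len(x)
    # Each bootstrap sample has size n and is drawn with replacement from x.
    return [stat([x[rng.randrange(n)] for _ in range(n)]) for _ in range(B)]

def mc_variance(t_star):
    """Monte carlo estimate sum(T_i*^2)/B - (sum(T_i*)/B)^2."""
    B = len(t_star)
    mean = sum(t_star) / B
    return sum(t * t for t in t_star) / B - mean * mean

def mc_tail_prob(t_star, c):
    """Monte carlo estimate #{T_i* > c} / B."""
    return sum(1 for t in t_star if t > c) / len(t_star)

def sample_mean(v):
    return sum(v) / len(v)

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(30)]      # illustrative data only
t_star = bootstrap_values(x, sample_mean, B=200, rng=rng)
var_hat = mc_variance(t_star)                      # estimate of VAR T
p_hat = mc_tail_prob(t_star, c=sample_mean(x))     # estimate of Prob(T > c)
```

Both estimates are averages over the same B bootstrap values, so one set of
bootstrap samples serves every functional Q of interest.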

Now let $Q_B(T,\hat F,X)$ be an approximation to $Q(T,\hat F,X)$ based on $B$
bootstrap samples. As $B\to\infty$ we have, for sufficiently well-behaved $Q$,
$Q_B(T,\hat F,X)\to Q(T,\hat F,X)$. The question we address here is: how big
should $B$ be so that $Q_B(T,\hat F,X)$ is (on the average) sufficiently close
to $Q(T,\hat F,X)$?

The approach we will take is the following. For a specific $Q$, we choose a
measure of $Q_B$'s accuracy in estimating $Q$. (This measure will be
conditional on the observed data.) Then we take a small number of bootstrap
samples (say 50 or 100), and estimate the accuracy of $Q_B(T,\hat F,X)$. If
$Q_B(T,\hat F,X)$ is not accurate enough, we take more bootstrap samples until
the estimated accuracy is sufficiently high.

In the following sections we illustrate this for three specific Q's:
standard error, percentile, and bias. In the final section we discuss a
number of points including Efron's (1984) approach to this problem.

3.  Number  of  bootstraps  for  standard  error  estimation. 

Here we take $Q=\hat\sigma=(\mathrm{Var}_{\hat F}(T))^{1/2}$. Let
$X_1^*,X_2^*,\ldots,X_B^*$ denote bootstrap samples generated from $\hat F$.
Then $Q_B=\hat\sigma_B$, the sample standard deviation of the bootstrap values
$T_i^*=T(X_i^*)$. A reasonable measure of the accuracy of $\hat\sigma_B$ is its
coefficient of variation conditional on $X$. Standard calculations show

$$\mathrm{CV}(\hat\sigma_B\mid X)=\left[\frac{\delta+2}{4B}\right]^{1/2}+O(1/B) \qquad (1)$$

where $\delta$ is the kurtosis of the bootstrap distribution of $T$. Table 1
shows $\mathrm{CV}(\hat\sigma_B\mid X)$ for $\delta=0$ and $\delta=3$ (the
kurtosis of the t distribution on 6 degrees of freedom).
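
The standard calculation behind (1) can be sketched as follows. The sketch
assumes that, conditional on $X$, the bootstrap values are i.i.d. with variance
$\hat\sigma^2$ and fourth central moment $\hat\mu_4=(\delta+3)\hat\sigma^4$
(so $\delta$ is the excess kurtosis, zero for a normal distribution); the
symbol $\hat\mu_4$ is introduced here for illustration only.

```latex
\begin{align*}
\mathrm{Var}(\hat\sigma_B^2 \mid X)
  &= \frac{\hat\mu_4-\hat\sigma^4}{B} + O(B^{-2})
   = \frac{(\delta+2)\,\hat\sigma^4}{B} + O(B^{-2}), \\
\mathrm{Var}(\hat\sigma_B \mid X)
  &\approx \frac{\mathrm{Var}(\hat\sigma_B^2 \mid X)}{4\hat\sigma^2}
   = \frac{(\delta+2)\,\hat\sigma^2}{4B}
   \qquad\text{(delta method applied to } u\mapsto u^{1/2}\text{)}, \\
\mathrm{CV}(\hat\sigma_B \mid X)
  &= \frac{\mathrm{Var}(\hat\sigma_B \mid X)^{1/2}}{\hat\sigma}
   \approx \left[\frac{\delta+2}{4B}\right]^{1/2}.
\end{align*}
```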


Table 1.

$\mathrm{CV}(\hat\sigma_B\mid X)$ as a function of $B$ and $\delta$

         B      10    20    50   100   200   500  1000

  $\delta=0$   .22   .14   .10   .07   .05   .03   .02

  $\delta=3$   .35   .25   .16   .11   .08   .05   .04

We propose estimation of $\mathrm{CV}(\hat\sigma_B\mid X)$ from the bootstrap
distribution as a guideline for the choice of $B$. The suggested procedure is
to take say 50 bootstrap samples, estimate $\mathrm{CV}(\hat\sigma_B\mid X)$
from the bootstrap distribution, take more samples, and so on, until the
estimated coefficient of variation is small enough. Note that estimation of
$\mathrm{CV}(\hat\sigma_B\mid X)$ requires an estimate of $\delta$. For this we
use the sample kurtosis of the bootstrap values. For non-robust $T$, it would
be preferable to use a more robust measure.
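
The suggested procedure can be sketched in Python as follows. This is a minimal
illustration rather than the implementation used for the examples below: the
batch size of 50, the cap `max_b`, the default target of .06 and all function
names are assumptions, and the O(1/B) term in (1) is ignored.

```python
import math
import random

def sample_sd(v):
    m = sum(v) / len(v)
    return math.sqrt(sum((t - m) ** 2 for t in v) / (len(v) - 1))

def excess_kurtosis(v):
    """Sample kurtosis m4 / m2^2 - 3 (zero for a normal distribution)."""
    m = sum(v) / len(v)
    m2 = sum((t - m) ** 2 for t in v) / len(v)
    m4 = sum((t - m) ** 4 for t in v) / len(v)
    return m4 / (m2 * m2) - 3.0

def estimated_cv(t_star):
    """Plug the sample kurtosis of the bootstrap values into equation (1)."""
    delta = excess_kurtosis(t_star)
    return math.sqrt((delta + 2.0) / (4.0 * len(t_star)))

def adaptive_se(x, stat, target_cv=0.06, batch=50, max_b=5000, seed=0):
    """Add bootstrap samples in batches until the estimated CV is small enough."""
    rng = random.Random(seed)
    n = len(x)
    t_star = []
    while True:
        for _ in range(batch):
            resample = [x[rng.randrange(n)] for _ in range(n)]
            t_star.append(stat(resample))
        cv = estimated_cv(t_star)
        if cv <= target_cv or len(t_star) >= max_b:
            return sample_sd(t_star), cv, len(t_star)

# Illustrative use with simulated data and the sample mean as the statistic.
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(40)]
se, cv, b_used = adaptive_se(data, lambda v: sum(v) / len(v))
```

The stopping rule only looks at the bootstrap values already computed, so no
bootstrap sample is wasted when more are added.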

What is a "small" coefficient of variation? One way to determine this is to
examine the effect of an error in $\hat\sigma_B$ on the coverage of a
confidence interval of the form $T(X)\pm\hat\sigma_B z^{(1-\alpha)}$, where
$z^{(\alpha)}$ is the $100\alpha$-th percentage point of the standard normal
distribution. Let $\theta$ be the parameter of $F$ that $T$ estimates and let
$g(\hat\sigma_B)=P(\theta\in[T(X)-\hat\sigma_B z^{(1-\alpha)},\,T(X)+\hat\sigma_B z^{(1-\alpha)}])$,
the coverage of the standard interval. A Taylor series argument gives for the
(conditional) standard deviation of $g(\hat\sigma_B)$

$$\mathrm{SD}(g(\hat\sigma_B)\mid X)\approx 2\,\phi(z^{(1-\alpha)})\,z^{(1-\alpha)}\,\mathrm{CV}(\hat\sigma_B\mid X) \qquad (2)$$

This  relationship  is  illustrated  in  Table  2. 


Table 2.

$\mathrm{SD}(g(\hat\sigma_B)\mid X)$ and $\mathrm{CV}(\hat\sigma_B\mid X)$
for $\alpha=.025$ and $.05$

                     CV
   SD    $\alpha=.025$   $\alpha=.05$

  .01        .04            .03

  .02        .09            .06

  .05        .22            .15

Thus if we aim for a 90% confidence interval and we're willing to allow a
standard error of 2% in the coverage, we require
$\mathrm{CV}(\hat\sigma_B\mid X)$ to be about .06. For a 95% interval, we
require $\mathrm{CV}(\hat\sigma_B\mid X)$ to be .09. In the examples that
follow, we will use .06 as our target, although this is of course up to the
statistician to choose in any particular problem.
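
Relation (2) can be inverted to find the CV that corresponds to a chosen
standard deviation of the coverage. The following Python sketch does this; the
function names are illustrative and not from the report, and $\phi$ is taken to
be the standard normal density.

```python
from statistics import NormalDist

def coverage_sd(cv, alpha):
    """Relation (2): SD of coverage ~ 2 * phi(z) * z * CV, with z = z^(1-alpha)."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - alpha)
    return 2.0 * std_normal.pdf(z) * z * cv

def required_cv(sd, alpha):
    """CV of the standard error estimate that yields a coverage SD of `sd`."""
    # coverage_sd is linear in cv, so dividing by its value at cv = 1 inverts it.
    return sd / coverage_sd(1.0, alpha)

cv_90 = required_cv(0.02, alpha=0.05)    # 90% interval, about .06
cv_95 = required_cv(0.02, alpha=0.025)   # 95% interval, about .09
```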


Example  1.  The  correlation  coefficient. 

The data for this example come from the SAS Basics manual (1982),
page 510. They consist of 50 measurements of chest and abdomen skinfold
thickness. The statistic we chose was the sample correlation coefficient,
which had a value of .620 for the original data. Table 3 shows the results
of successively increasing the number of bootstraps.

Table 3.

Results for the correlation coefficient
applied to the skinfold data

     B    $\delta$   $\hat\sigma_B$    CV

    50     -1.1          .090         .065

   100     -.44          .097         .062

   200      .13          .097         .052

   500     -.20          .100         .030

  1000      .25          .106         .024

We  see  that  with  200  bootstraps  the  CV  is  below  .06,  and  even  50  may  be 
adequate. 


Example  2.  Cox's  model. 

In this example we bootstrapped Cox's partial likelihood estimate for
the proportional hazards model. The data consisted of 200 measurements
on mice taken from Kalbfleisch and Prentice (1980), p. 233. The outcome
was survival in days; the covariate was % antibody level. The
bootstrapping was performed by treating the response, covariate and
censoring indicator for each mouse as the sampling unit. The partial
likelihood estimate for the original data was -.015. The bootstrap results
are shown in Table 4.

Table 4.

Results for Cox's estimator
applied to mouse leukemia data.

     B    $\delta$   $\hat\sigma_B$    CV

    50     3.02          .010         .16

   100     2.04          .009         .10

   200     1.75          .006         .07

   300     2.00          .008         .06

   500     1.86          .008         .04

  1000     3.00          .009         .04

About 300 bootstraps are necessary to get CV down to about .06, despite
the fact that the value of $\hat\sigma_B$ changes very little as $B$ increases.

4.  Number  of  bootstraps  for  percentile  points 

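For the problem of estimating percentile points, the functional $Q(T,F,X)$ is
$G^{-1}(\alpha)$, where $G$ is the distribution of $T$ under $F$. The bootstrap
estimate of $G^{-1}(\alpha)$ is $\hat G^{-1}(\alpha)$, where $\hat G$ is the
bootstrap distribution of $T(X^*)$, that is, the distribution of $T(X^*)$ under
$\hat F$. Letting $\hat G_B$ be the empirical distribution function of
$T_1^*,\ldots,T_B^*$, the monte carlo approximation to $\hat G^{-1}(\alpha)$ is
$\hat G_B^{-1}(\alpha)$. (If $G^{-1}(\alpha)$ is not uniquely defined, we will
assume some reasonable definition like $\inf\{t: G(t)\ge\alpha\}$, and
similarly for $\hat G^{-1}(\alpha)$ and $\hat G_B^{-1}(\alpha)$.) Standard
calculations give

$$\mathrm{CV}[\hat G_B^{-1}(\alpha)]\approx\frac{[\alpha(1-\alpha)]^{1/2}}{B^{1/2}\,|\hat G^{-1}(\alpha)|\,\hat g(\hat G^{-1}(\alpha))} \qquad (3)$$

In the above $\hat g(\cdot)$ is $d\hat G(t)/dt$. Now if $\hat G$ is normal
(with mean zero), then
$\hat g(\hat G^{-1}(\alpha))=|z^{(\alpha)}/\hat G^{-1}(\alpha)|\,\phi(z^{(\alpha)})$
and hence
$\mathrm{CV}[\hat G_B^{-1}(\alpha)]=[\alpha(1-\alpha)]^{1/2}/[B^{1/2}\,|z^{(\alpha)}|\,\phi(z^{(\alpha)})]$.
For this case, Table 5 shows a tabulation of
$\mathrm{CV}[\hat G_B^{-1}(\alpha)]$.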

Table 5.

$\mathrm{CV}[\hat G_B^{-1}(\alpha)]$ for $\hat G$ normal

                    $\alpha$
     B     .75    .90    .95    .975

    50     .29    .19    .18    .19

   100     .20    .13    .13    .14

   200     .14    .09    .09    .10

   500     .09    .06    .06    .06

  1000     .06    .04    .04    .04
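
The normal-case values can be recomputed directly from the closed form
$[\alpha(1-\alpha)]^{1/2}/[B^{1/2}|z^{(\alpha)}|\phi(z^{(\alpha)})]$. The
following Python sketch assumes a mean-zero normal bootstrap distribution; the
function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def cv_quantile_normal(alpha, B):
    """CV of the monte carlo percentile when the bootstrap distribution is normal."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(alpha)   # z^(alpha)
    return sqrt(alpha * (1.0 - alpha)) / (sqrt(B) * abs(z) * std_normal.pdf(z))

alphas = (0.75, 0.90, 0.95, 0.975)
table = {B: [round(cv_quantile_normal(a, B), 2) for a in alphas]
         for B in (50, 100, 200, 500, 1000)}
```

Because the expression depends on $B$ only through $B^{-1/2}$, each fourfold
increase in $B$ halves the CV.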

As we did in the previous section, we must address the question:
What is a "small" coefficient of variation for this problem? Let
$\hat\alpha=P_{\hat F}(T_i^*<\hat G_B^{-1}(\alpha))$. We can measure the size
of $\mathrm{CV}[\hat G_B^{-1}(\alpha)]$ by assessing its effect on
$\mathrm{SD}(\hat\alpha)$. If $\hat G$ is approximately normal, it is easy
to show that

$$\mathrm{SD}(\hat\alpha)\approx|z^{(\alpha)}|\,\phi(z^{(\alpha)})\,\mathrm{CV}[\hat G_B^{-1}(\alpha)] \qquad (4)$$

Table 6 tabulates this for various values of $\alpha$ and
$\mathrm{SD}(\hat\alpha)$.


Table 6.

$\mathrm{CV}[\hat G_B^{-1}(\alpha)]$ as a function of $\alpha$ and
$\mathrm{SD}(\hat\alpha)$

                               $\alpha$
  $\mathrm{SD}(\hat\alpha)$    .75    .90    .95    .975

        .005                   .02    .02    .03    .04

        .01                    .05    .04    .06    .09

        .02                    .09    .09    .12    .17

        .05                    .23    .22    .29    .44

Thus for example if we want to estimate the 90th percentile with a
standard error of .01 in the coverage, $\mathrm{CV}[\hat G_B^{-1}(\alpha)]$ has
to be less than or equal to .04. Assuming again that $\hat G$ is normal,
Table 5 then tells us that $B$ should be at least 1000.
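
Under the same normal approximation, the relation between
$\mathrm{SD}(\hat\alpha)$ and the CV can be combined with the normal-case
expression for the CV, which may help in reading Tables 5 and 6 together. The
following is a sketch of that arithmetic, not a result stated in the report;
$\phi$ denotes the standard normal density.

```latex
\begin{align*}
\mathrm{SD}(\hat\alpha)
  &\approx |z^{(\alpha)}|\,\phi(z^{(\alpha)})\,\mathrm{CV}[\hat G_B^{-1}(\alpha)]
   = |z^{(\alpha)}|\,\phi(z^{(\alpha)})\cdot
     \frac{[\alpha(1-\alpha)]^{1/2}}{B^{1/2}\,|z^{(\alpha)}|\,\phi(z^{(\alpha)})}
   = \left[\frac{\alpha(1-\alpha)}{B}\right]^{1/2}, \\
B &\approx \frac{\alpha(1-\alpha)}{\mathrm{SD}(\hat\alpha)^2}
   = \frac{(.90)(.10)}{(.01)^2} = 900
   \quad\text{for } \alpha=.90,\ \mathrm{SD}(\hat\alpha)=.01 .
\end{align*}
```

This is of the same order as the value of 1000 read from the rounded entries
of the tables.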

Instead of assuming normality, we can take the approach discussed in
section 2: estimate $\hat G^{-1}(\alpha)$ and $\hat g(\hat G^{-1}(\alpha))$
from the bootstrap distribution and thus get an estimate of
$\mathrm{CV}[\hat G_B^{-1}(\alpha)]$. We tried this in the next two examples,
using a kernel density estimate of the form
$\sum(1/h\hat\sigma_B)\,\phi((T_i^*-y)/h\hat\sigma_B)$. A value of .5 was used
for the window parameter $h$; fortunately, the results changed very little
when $h$ was varied.
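
A minimal Python sketch of this estimate is given below. It assumes a Gaussian
kernel with bandwidth $h\hat\sigma_B$ and a density estimate normalised by $B$;
the normalisation, the quantile convention and all function names are
assumptions made for the illustration, and the simulated normal values merely
stand in for real bootstrap values.

```python
import math
import random

def sample_sd(v):
    m = sum(v) / len(v)
    return math.sqrt(sum((t - m) ** 2 for t in v) / (len(v) - 1))

def empirical_quantile(t_star, alpha):
    """inf{t : G_B(t) >= alpha} for the empirical distribution of t_star."""
    s = sorted(t_star)
    k = max(1, math.ceil(alpha * len(s)))
    return s[k - 1]

def kernel_density(t_star, y, h=0.5):
    """Gaussian kernel density estimate at y with bandwidth h * sd(t_star)."""
    bw = h * sample_sd(t_star)
    total = sum(math.exp(-0.5 * ((t - y) / bw) ** 2) for t in t_star)
    return total / (len(t_star) * bw * math.sqrt(2.0 * math.pi))

def estimated_cv_quantile(t_star, alpha, h=0.5):
    """Estimate of CV for the monte carlo percentile, without assuming normality."""
    B = len(t_star)
    q = empirical_quantile(t_star, alpha)      # estimate of the percentile
    g = kernel_density(t_star, q, h)           # estimate of the density there
    return math.sqrt(alpha * (1.0 - alpha)) / (math.sqrt(B) * abs(q) * g)

rng = random.Random(0)
t_star = [rng.gauss(0.0, 1.0) for _ in range(4000)]   # stand-in bootstrap values
cv_90 = estimated_cv_quantile(t_star, 0.90)
```

The CV is relative to the size of the percentile itself, so the estimate is
unstable when the percentile is close to zero.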


Example  3.  Percentile  points  for  the  setup  of  example  1. 

We applied the bootstrap to estimate a percentile of the bootstrap
distribution for the correlation estimate in the skinfold data. Table 7
shows the results for $\alpha=.10$ and $.05$.


Table 7.

Percentile results for skinfold data

                     $\alpha=.025$                                      $\alpha=.10$
     B   $\mathrm{CV}[\hat G_B^{-1}(\alpha)]$  $\hat G_B^{-1}(\alpha)$   $\mathrm{CV}[\hat G_B^{-1}(\alpha)]$  $\hat G_B^{-1}(\alpha)$

   100            .014                  .752                      .013                  .711

   200            .011                  .761                      .011                  .725

   500            .006                  .772                      .006                  .732

  1000            .005                  .786                      .005                  .743

We see that 100 bootstraps are adequate for both values of $\alpha$.

Example  4.  Percentile  points  for  Example  2  (Cox  model). 


The  results  for  the  Cox  model  applied  to  the  mouse  leukemia  data  are 
shown  in  Table  8. 


Table  8. 

Percentile  results  for  Cox  model 


$\alpha=.025$


In order to get CV down to .04, 1000 bootstraps are needed for $\alpha=.10$;
more than 1000 bootstraps are required for $\alpha=.025$. In the second case,
$\hat G_B^{-1}(\alpha)$ isn't changing much but the estimates still have large
variability.


5.  Bias  estimation. 

The mean bias of the estimator $T$ is defined as $E_F T-\theta$, and is
estimated by the bootstrap quantity
$E_{\hat F}T(X_1^*,X_2^*,\ldots,X_N^*)-T(X_1,X_2,\ldots,X_N)$. The monte carlo
approximation to the bootstrap estimate is
$\sum T_i^*/B-T(X_1,X_2,\ldots,X_N)$. Assuming
$E_{\hat F}T(X_1^*,X_2^*,\ldots,X_N^*)=T(X_1,X_2,\ldots,X_N)$, this has
coefficient of variation $\hat\sigma/(B^{1/2}\,|T(X_1,X_2,\ldots,X_N)|)$. Using
$\hat\sigma_B$ as an estimate of $\hat\sigma$, Table 9 shows the CV as $B$
increases for the setup of Example 1.

Table 9.

CV for estimate of bias
(correlation coefficient, example 1)

     B     CV

    50    .027

   100    .016

   200    .011

   500    .007

  1000    .005

There seems to be no natural way to decide what a "small" CV for bias is;
one might arbitrarily decide that .05 is small. In that case, as few as 50
bootstraps are satisfactory in this example. For the Cox model example,
about 200 bootstraps are necessary to get the CV down below .05.
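
The bias approximation and its coefficient of variation can be computed from
the bootstrap values as in the following Python sketch. The function names are
illustrative, and the CV is taken relative to the size of the observed
statistic, as in the expression for the coefficient of variation above.

```python
import math

def cv_from_summary(sd_b, B, t_obs):
    """CV of the monte carlo bias approximation: sd_b / (sqrt(B) * |t_obs|)."""
    return sd_b / (math.sqrt(B) * abs(t_obs))

def bias_and_cv(t_star, t_obs):
    """Monte carlo bias approximation sum(T_i*)/B - T, and its estimated CV."""
    B = len(t_star)
    mean_star = sum(t_star) / B
    bias = mean_star - t_obs
    # Sample standard deviation of the bootstrap values, used in place of sigma-hat.
    sd_b = math.sqrt(sum((t - mean_star) ** 2 for t in t_star) / (B - 1))
    return bias, cv_from_summary(sd_b, B, t_obs)

# Using the reported values for the correlation example at B = 100:
cv_100 = cv_from_summary(0.097, 100, 0.620)   # about .016
```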

Median bias is defined as $G^{-1}(1/2)-\theta$ and is estimated by
$\hat G^{-1}(1/2)-T(X_1,X_2,\ldots,X_N)$. The monte carlo approximation
$\hat G_B^{-1}(1/2)-T(X_1,X_2,\ldots,X_N)$ has coefficient of variation that
can be estimated by the method described in section 4.

A related problem is the estimation of $\alpha=P(T>t)$, where $\alpha$ is near
$1/2$. The bootstrap estimate is $\hat\alpha=P_{\hat F}(T_i^*>t)$ and the monte
carlo approximation is $\hat\alpha_B=\#\{T_i^*>t\}/B$. The standard deviation
of $\hat\alpha_B$ is $[\hat\alpha(1-\hat\alpha)/B]^{1/2}$, which equals
$1/(2B^{1/2})$ when $\hat\alpha=1/2$. In order for the standard deviation to
get down to .02, $B$ must be 625; a standard deviation of .01 requires
$B=2500$. Efron (1984) notes this fact in the context of estimation of the
bias-corrected percentile interval and suggests a better method of
approximation.
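
The arithmetic behind these two values of $B$ follows from the binomial
standard deviation and can be checked with a short Python sketch; the function
name is hypothetical.

```python
import math

def required_b(sd, alpha_hat=0.5):
    """Smallest B for which sqrt(alpha_hat * (1 - alpha_hat) / B) <= sd."""
    exact = alpha_hat * (1.0 - alpha_hat) / sd ** 2
    # The small tolerance guards against floating-point rounding just above an integer.
    return math.ceil(exact - 1e-9)

b_for_02 = required_b(0.02)   # 625
b_for_01 = required_b(0.01)   # 2500
```

Halving the target standard deviation quadruples the number of bootstraps
required.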

6.  Discussion. 

We have presented here an adaptive method for determining the number
of bootstraps necessary to achieve a pre-specified accuracy. These
techniques should be used with some caution. In the first two problems
considered, estimates of kurtosis and the density of the bootstrap
distribution were required. The estimation of both these quantities is
quite delicate and shouldn't be attempted for bootstrap sample sizes less
than 50 and 100, respectively. It goes without saying that if you can
feasibly take 1000 or more bootstraps, then take them! One reason is
that often attention is not focussed on a single aspect like variance; for
example, the general shape of the bootstrap distribution may be of
interest. For this it is hard to quantify how large $B$ should be, and the
bigger the better. Some further remarks:

Remark A. An unconditional view of the problem. Efron (1984)
computes the unconditional coefficient of variation of $\hat\sigma_B$ and
$\hat G_B^{-1}(\alpha)$ as a way of determining how large $B$ should be. He
takes the point of view that $B$ is large enough if the conditional variation
in $\hat\sigma_B$ is small compared to its unconditional variation, that is, if
increasing $B$ doesn't decrease the unconditional CV substantially. For
standard error estimation he assumes that $E(\delta)=0$ and shows that $B$ as
small as 25 or 50 can be adequate. For percentile estimation, his results,
based on normal approximations, indicate that $B=1000$ is necessary. In this
paper, we have taken a slightly different view: even if the sampling
variability of the estimate is large, one might still want an accurate measure
of its standard error. Also, the adaptive approach allows one to take fewer or
more bootstraps depending on the variability of the bootstrap values observed.

Remark B. Variance reduction techniques. The use of monte carlo
variance  reduction  techniques  can  greatly  reduce  the  number  of  bootstraps 
necessary  to  achieve  a  given  accuracy.  Therneau  (1983)  looked  at  a 
number  of  methods,  most  notably  control  variates,  for  variance  and  bias 
estimation.  Johns  (1984  personal  communication)  has  had  success  with 
importance  sampling  for  percentile  estimation. 

Remark C. Relationship between sample size and number of
bootstraps. If $T=\bar X$, it is straightforward to show that the
unconditional CV of $\hat\sigma_B$ is of the form
$[a/N+b/(NB)+c/B]^{1/2}$, where $a$, $b$ and $c$ are constants depending on $F$
(the distribution of $X_i$). Hence as $N$ increases, $\mathrm{CV}(\hat\sigma_B)$
goes down at the rate $1/N^{1/2}$. This makes sense intuitively: as the sample
size increases, the kurtosis of the bootstrap distribution of $\bar X^*$
decreases. By linearizing a non-linear statistic, one could presumably show
that this is approximately true in general. This doesn't mean, however, that
fewer computations are necessary, since each evaluation of the statistic will
typically be at least $O(N)$.

Remark D. The parametric and smooth bootstraps. The techniques
discussed in this paper did not make any special use of the fact that $F$ was
estimated by the empirical distribution function $\hat F$. Hence they are still
appropriate if some other estimate of $F$ is used, for example a parametric
estimate ("the parametric bootstrap") or a semi-parametric estimate ("the
smooth bootstrap").




Acknowledgement: this research was supported by an Ontario Ministry of Health
fellowship.


REFERENCES 


Efron, B. (1979). Bootstrap methods: another look at the jackknife.
Annals of Statistics 7, 1-26.

Efron, B. (1984). Better bootstrap confidence intervals. Stanford
University technical report LCS 14.

Kalbfleisch, J. and Prentice, R. (1980). The statistical analysis of
failure time data. Wiley, New York.

Kendall, M. and Stuart, A. (1958). The advanced theory of statistics.
Griffin, London.

SAS User's Guide: Basics manual (1982). SAS Institute Incorporated,
Cary, North Carolina.

Therneau, T. (1983). Variance reduction techniques for the bootstrap.
Stanford University technical report 200.


UNCLASSIFIED

REPORT DOCUMENTATION PAGE

Title: How Many Bootstraps?
Type of report: Technical Report
Author: Robert Tibshirani
Contract or grant number: N00014-76-C-0475
Performing organization: Department of Statistics, Stanford University,
Stanford, CA 94305
Controlling office: Office of Naval Research, Statistics & Probability
Program Code 411SP
Report date: August 22, 1985
Number of pages: 15
Distribution statement: Approved for public release; distribution unlimited
Key words: Bootstrap, monte carlo approximation

Abstract:

In approximating bootstrap quantities by monte carlo simulation, one must decide
how many bootstrap samples to generate. We propose an adaptive sequential method
that estimates the accuracy based on the current bootstrap samples. Bootstrap
sampling is continued until the estimated accuracy is high enough. In the
examples given, 100 to 300 bootstraps are sufficient for standard error and bias
estimation, while 1000 bootstraps may be necessary for estimating a percentile.

UNCLASSIFIED